Abstract

This paper provides a short and transparent solution for the covering cost of white-grey trees, which play a crucial role in the algorithm of Bergeron et al. that computes the rearrangement distance between two multichromosomal genomes in linear time (Theor. Comput. Sci., 410:5300-5316, 2009). In the process, it introduces a new center notion for trees, which seems to be interesting on its own.
1. Introduction

Computational comparative genomics is a subdiscipline of computational biology in which the relationships between two or more genomes are studied by computational means. A highly relevant question in this field is the calculation of the minimum number of rearrangement operations (reversals, translocations, fusions and fissions) that are necessary to transform one given genome into another, the so-called genome rearrangement problem [1].

The white-grey tree cover problem studied in this paper (formally defined in Section 2) arises as a subproblem of the genome rearrangement problem, and so far only an unsatisfactory (and not self-contained) solution exists [2]. The main goal of this paper is to give a short solution of the problem and to correct some omissions and discrepancies of the original formulation. (In Section 4 we point out some cases where the original formulation fails.) Moreover, the problem gives rise to a combinatorial question on trees, detailed in Section 3, which seems to be interesting on its own. Since one of our main concerns here is brevity, we usually do not give detailed proofs of easy facts that are not essential for our main goal.
2. Problem definition



A white-grey tree is a rooted tree whose vertices are either colored (white or grey) or uncolored. The root is uncolored, some children of the root are grey (some of them can be leaves), and all leaves which are not children of the root are white. All uncolored vertices (with the possible exception of the root) are branching points.

A system of paths in a white-grey tree is a colored cover if: 

(i) Each path has colored endpoints. One vertex alone may constitute a path. 

(ii) Each colored vertex is covered by at least one path.

The cost of a path P is denoted by cost(P) and is defined as follows:

(i) P is short if it has exactly one vertex. Then cost(P) = 1.

(ii) P is grey if its endpoints are grey vertices (then the third vertex is the uncolored root). Then cost(P) = 1.

(iii) P is long otherwise. Then cost(P) = 2.

Definition 1. The cost of a path system $\mathcal{P}$ is the sum of the individual costs: $\mathrm{cost}(\mathcal{P}) := \sum_{P \in \mathcal{P}} \mathrm{cost}(P)$. A colored cover $\mathcal{P}$ is optimal for a given white-grey tree T if it has minimal cost among all possible colored covers; this minimal cost is denoted by cost(T).

Problem 2 (White-grey tree cover problem). Given a white-grey tree T, compute cost(T).

The main result of this paper is a simple way to calculate the exact cost of an optimal cover. We are not quite ready to formalize the main result (without some further observations it would require a detailed case analysis), but we mention here a well-known fact [1]:

Lemma 3. Let T be a white-grey tree with w white and g grey leaves. Then

$$w + \lceil g/2 \rceil \le \mathrm{cost}(T) \le w + \lceil g/2 \rceil + 1.$$ □
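To make the definitions concrete, the following Python sketch (our own illustration, not a method from the paper) prices a path with the three rules above and computes cost(T) by brute force: it enumerates every path with colored endpoints and runs a dynamic program over subsets of covered colored vertices. It is exponential and only meant for checking tiny examples. The encoding (an adjacency dict, a `color` dict with values "white"/"grey" in which uncolored vertices are simply absent) and all function names are assumptions. The second example tree is the one discussed in Section 4.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency lists from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def tree_path(adj, u, v):
    """The unique u-v path of a tree, as a list of vertices."""
    parent = {u: None}
    queue = [u]
    for x in queue:  # breadth-first search; the list grows while we scan it
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = [v]
    while path[-1] != u:
        path.append(parent[path[-1]])
    return path[::-1]


def path_cost(path, color, root):
    """Cost of one path: short = 1, grey (grey, root, grey) = 1, long = 2."""
    if len(path) == 1:
        return 1
    if (len(path) == 3 and path[1] == root
            and color.get(path[0]) == color.get(path[-1]) == "grey"):
        return 1
    return 2


def optimal_cover_cost(adj, color, root):
    """Brute-force cost(T): DP over subsets of covered colored vertices."""
    colored = [x for x in adj if color.get(x) is not None]
    bit = {x: 1 << i for i, x in enumerate(colored)}
    candidates = []  # (covered colored vertices as a bitmask, cost of the path)
    for u, v in list(combinations(colored, 2)) + [(x, x) for x in colored]:
        p = tree_path(adj, u, v)
        mask = sum(bit[x] for x in p if x in bit)
        candidates.append((mask, path_cost(p, color, root)))
    full = (1 << len(colored)) - 1
    best = [float("inf")] * (full + 1)
    best[0] = 0
    for mask in range(full + 1):  # mask | pmask >= mask, so increasing order works
        for pmask, c in candidates:
            best[mask | pmask] = min(best[mask | pmask], best[mask] + c)
    return best[full]


def lemma3_bounds(w, g):
    """Lower and upper bound of Lemma 3: w + ceil(g/2) and one more."""
    low = w + (g + 1) // 2
    return low, low + 1


# Tree 1: grey leaves g1, g2 and a white child a with white leaves x, y (w = g = 2).
adj1 = build_adj([("r", "g1"), ("r", "g2"), ("r", "a"), ("a", "x"), ("a", "y")])
color1 = {"g1": "grey", "g2": "grey", "a": "white", "x": "white", "y": "white"}
# Tree 2: one grey leaf g and a branching vertex b with legs c_i - l_i (w = 3, g = 1).
adj2 = build_adj([("r", "g"), ("r", "b")]
                 + [("b", f"c{i}") for i in (1, 2, 3)]
                 + [(f"c{i}", f"l{i}") for i in (1, 2, 3)])
color2 = {"g": "grey", **{v: "white" for v in ("c1", "c2", "c3", "l1", "l2", "l3")}}
print(optimal_cover_cost(adj1, color1, "r"), lemma3_bounds(2, 2))  # 3 (3, 4)
print(optimal_cover_cost(adj2, color2, "r"), lemma3_bounds(3, 1))  # 4 (4, 5)
```

In both examples the optimum meets the lower bound of Lemma 3.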

3. Balanced vertices in trees 

In this section we prove a useful tool for (unrooted) trees which seems to be interesting on its own. In a tree T', denote by $P_{u,v}$ the unique path between the vertices u and v.

Theorem 4. Let T' be a tree with 2n leaves, and let $\mathcal{L}$ denote its set of leaves. Then there exist a vertex $v \in V(T')$ and a bijection $\sigma: \mathcal{L} \to \mathcal{L}$ such that the path system $\{P_{\ell,\sigma(\ell)} : \ell \in \mathcal{L}\}$ covers each vertex of T' and all paths contain v.

We offer here two proofs. One gives a very simple algorithm to construct such a cover, but 
it clearly cannot provide all possible solutions. The second proof is based on a necessary and 
sufficient reformulation of the statement. 

First proof. Consider an embedding of our tree into the plane and enumerate the leaves in counterclockwise order. One way to obtain such a numbering is to fix a leaf as a root and take the left-to-right, depth-first traversal of the tree which conforms with the embedding.

Now we define our bijection by the formula $\sigma(\ell_i) := \ell_{(i+n) \bmod 2n}$, where $\ell_0, \ldots, \ell_{2n-1}$ are the leaves in this order. For any two such paths, the endpoints alternate along the circle which contains the leaves in increasing order. Therefore these two paths clearly intersect each other.
As is well known (its very first proof is due to Gyárfás and Lehel [3]), if a set of paths in a tree does not contain two disjoint paths, then all the paths share a common vertex v. Every leaf is an endpoint of one of our paths, so the system contains the path from v to each leaf; since every edge of a tree lies on the path from v to some leaf, the paths cover all edges (and hence all vertices) of the tree. □

If T' is a fully balanced tree, then, whatever the embedding in the previous proof, two close leaves will be paired with two close leaves. Therefore there are clearly solutions which cannot be obtained with the previous method. In the remaining part of this section we sketch a proof which is able to find all possible solutions.

Second proof. For each vertex-edge pair (v, e) with $v \in V(T')$, $e \in E(T')$ and $v \in e$, denote by $\delta(v, e)$ the number of leaves $\ell$ of T' such that $P_{v,\ell}$ contains the edge e. Furthermore, denote by $E(v)$ the set of edges that contain v.

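In the configuration required by Theorem 4, the vertex v clearly satisfies the following inequality, since each leaf behind an edge $e \in E(v)$ must be paired with a leaf behind another edge of $E(v)$:

$$\forall e \in E(v): \quad \delta(v, e) \le \sum_{f \in E(v),\, f \ne e} \delta(v, f). \qquad (1)$$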

Such a vertex $v \in V(T')$ is called a balanced vertex. (If a vertex-edge pair does not satisfy this inequality, then the pair is called oversaturated.) In fact, this property is equivalent to the existence of the required cover:

Lemma 5. Let T' be a tree with 2n leaves, $n \ge 1$. Then for any balanced vertex v there exists a bijection $\sigma$ such that the paths $P_{\ell,\sigma(\ell)}$ cover all edges, and all paths contain v.

The easy proof is left to the diligent reader (one can argue, for example, by mathematical induction). A balanced vertex of a tree is similar to the well-known notion of the center of a tree, but while a center is (almost) unique, there may be several balanced vertices.
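The Python sketch below checks inequality (1) directly and, for Lemma 5, uses a direct construction of our own rather than the induction mentioned above; treat it as an assumption-labelled illustration with hypothetical names. If v is balanced (hence not a leaf), each branch at v holds at most n of the 2n leaves, so after listing the leaves branch by branch, positions i and i + n never fall into the same branch. Every pair is therefore joined through v, and since each branch contains a leaf, every edge is covered.

```python
def build_adj(edges):
    """Undirected adjacency lists from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def branch_leaves(adj, v, u):
    """Leaves of T' on u's side of the edge {v, u}; delta(v, e) is their number."""
    seen, stack, found = {v, u}, [u], []
    while stack:
        x = stack.pop()
        if len(adj[x]) == 1:
            found.append(x)
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return found


def is_balanced(adj, v):
    """Inequality (1): no delta(v, e) exceeds the sum over the other edges at v."""
    deltas = [len(branch_leaves(adj, v, u)) for u in adj[v]]
    total = sum(deltas)
    return all(d <= total - d for d in deltas)


def pairing_through(adj, v):
    """List the leaves branch by branch around v and pair position i with i + n."""
    branches = [branch_leaves(adj, v, u) for u in adj[v]]
    order = [leaf for branch in branches for leaf in branch]
    n = len(order) // 2
    if any(len(branch) > n for branch in branches):
        raise ValueError("v is not balanced")
    return [(order[i], order[i + n]) for i in range(n)]


adj = build_adj([("x", "a"), ("a", "b"), ("b", "y"), ("a", "p"), ("b", "q")])
print(sorted(v for v in adj if is_balanced(adj, v)))  # ['a', 'b']
print(pairing_through(adj, "a"))                       # [('x', 'y'), ('q', 'p')]
```

Here both a and b are balanced, illustrating that, unlike a center, a balanced vertex need not be unique.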
The following observation completes our second proof of Theorem 4.

Lemma 6. Any tree T' with an even number of leaves contains a balanced vertex. 

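Proof. (Sketch) Assume that a particular vertex v is not balanced. Then there exists an edge $e \in E(v)$ such that the pair (v, e) is oversaturated. We repeat the process with the other endpoint u of that edge. If u is not balanced either, then it provides another oversaturated pair. This pair cannot be (u, e): the leaves on the two sides of e add up to 2n, and the oversaturation of (v, e) puts more than n of them on u's side (for $n \ge 2$), so $\delta(u, e) < n$. Hence the walk never returns along the edge it has just used, and in a tree such a walk never revisits a vertex. The finiteness of the graph finishes the proof of the lemma, and this also completes the second proof of Theorem 4. □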
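The walk in the proof sketch of Lemma 6 can be rendered directly in Python. This is a sketch with a hypothetical adjacency-dict encoding and helper names; it recomputes every $\delta(v, e)$ from scratch at each step, so it is quadratic rather than efficient, and it carries an explicit guard for the degenerate single-edge tree, in which both endpoints are oversaturated toward each other.

```python
def build_adj(edges):
    """Undirected adjacency lists from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def leaves_behind(adj, v, u):
    """delta(v, {v, u}): number of leaves on u's side of the edge {v, u}."""
    seen, stack, count = {v, u}, [u], 0
    while stack:
        x = stack.pop()
        if len(adj[x]) == 1:
            count += 1
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return count


def oversaturated_neighbour(adj, v):
    """Return u if the pair (v, {v, u}) violates inequality (1), otherwise None."""
    deltas = {u: leaves_behind(adj, v, u) for u in adj[v]}
    total = sum(deltas.values())
    for u, d in deltas.items():
        if d > total - d:
            return u
    return None


def walk_to_balanced(adj, start):
    """Follow oversaturated edges until a balanced vertex is reached."""
    visited = set()
    v = start
    while True:
        u = oversaturated_neighbour(adj, v)
        if u is None:
            return v  # no oversaturated pair at v: v is balanced
        visited.add(v)
        if u in visited:  # guard for degenerate input such as a single edge
            raise RuntimeError("the walk returned to a visited vertex")
        v = u


adj = build_adj([("x", "a"), ("a", "b"), ("b", "y"), ("a", "p"), ("b", "q")])
print(walk_to_balanced(adj, "x"), walk_to_balanced(adj, "y"))  # a b
```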

The flexibility in the pairing step can clearly produce any possible bijection $\sigma$. It is also worth noting that a suitable balanced vertex can be found quickly:

Lemma 7. Let T' be a tree with 2n leaves. Then there is a linear (in the number of leaves) time 
algorithm to find a balanced vertex in T' . 

This proof is again left to the reader; a simple dynamic programming algorithm suffices. □
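One possible dynamic program for Lemma 7 (our assumption, not necessarily the authors' algorithm) is sketched below in Python: root T' at any vertex, count in one bottom-up pass how many leaves lie below each vertex, and read off every $\delta(v, e)$ either as a child's count or as the total minus v's own count. Each vertex then costs O(deg v), so the sketch runs in time linear in the size of the tree; note that this measures vertices, whereas Lemma 7 is phrased in terms of the number of leaves. The encoding and names are hypothetical.

```python
def build_adj(edges):
    """Undirected adjacency lists from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def balanced_vertices(adj):
    """All balanced vertices of a tree, from one rooted traversal."""
    root = next(iter(adj))
    parent, order, stack = {root: None}, [], [root]
    while stack:  # iterative preorder
        x = stack.pop()
        order.append(x)
        for y in adj[x]:
            if y != parent[x]:
                parent[y] = x
                stack.append(y)
    below = {}  # number of leaves of T' in the subtree rooted at each vertex
    for x in reversed(order):  # every child appears after its parent in preorder
        below[x] = (len(adj[x]) == 1) + sum(below[y] for y in adj[x] if y != parent[x])
    total = below[root]
    result = []
    for v in order:
        deltas = [below[y] for y in adj[v] if y != parent[v]]  # child sides
        if parent[v] is not None:
            deltas.append(total - below[v])  # the side containing the root
        s = sum(deltas)
        if all(d <= s - d for d in deltas):
            result.append(v)
    return result


adj = build_adj([("x", "a"), ("a", "b"), ("b", "y"), ("a", "p"), ("b", "q")])
print(balanced_vertices(adj))  # ['a', 'b']
```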
4. Optimal colored covers 

We are ready to determine the cost of an optimal cover of the white-grey tree T. We say that a path in the cover is a mixed path if it contains at least two colored vertices and exactly one of its colored vertices is a grey leaf. We will use the notation $T_w$ for the subtree derived from T by deleting all grey leaves and their edges, and the root if it would become a leaf. Furthermore, for a path P in T we will use the notation $P|_{T_w}$ to denote the trace of P on $T_w$, i.e. the restriction of P to the nodes of $T_w$, with the extra condition that in the truncated path we delete the starting uncolored vertices (if any). We extend this notation to the trace $\mathcal{P}|_{T_w}$ of a path system $\mathcal{P}$.
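A small Python sketch of the two constructions just defined, under stated assumptions: trees are adjacency dicts, uncolored vertices are simply absent from `color`, and since the text only says that the starting uncolored vertices are deleted, the sketch strips uncolored vertices from both ends of the restricted path (our reading). All names are hypothetical. In the example the root has one grey leaf and one branching child, so the root disappears from $T_w$ (it would become a leaf), and the trace of the mixed path from g to l3 is the long path (c3, l3).

```python
def build_adj(edges):
    """Undirected adjacency lists from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def reduced_tree(adj, color, root):
    """T_w: delete the grey leaves, then the root if it would become a leaf."""
    grey_leaves = {x for x in adj
                   if x != root and color.get(x) == "grey" and len(adj[x]) == 1}
    keep = set(adj) - grey_leaves
    if sum(1 for y in adj[root] if y in keep) == 1:
        keep.discard(root)
    return {x: [y for y in adj[x] if y in keep] for x in keep}


def trace(path, tw, color):
    """P|T_w: keep the vertices of T_w, then drop uncolored vertices at the ends."""
    kept = [x for x in path if x in tw]
    while kept and color.get(kept[0]) is None:
        kept.pop(0)
    while kept and color.get(kept[-1]) is None:
        kept.pop()
    return kept


adj = build_adj([("r", "g"), ("r", "b")]
                + [("b", f"c{i}") for i in (1, 2, 3)]
                + [(f"c{i}", f"l{i}") for i in (1, 2, 3)])
color = {"g": "grey", **{v: "white" for v in ("c1", "c2", "c3", "l1", "l2", "l3")}}
tw = reduced_tree(adj, color, "r")
print(sorted(tw))                                      # ['b', 'c1', 'c2', 'c3', 'l1', 'l2', 'l3']
print(trace(["g", "r", "b", "c3", "l3"], tw, color))  # ['c3', 'l3']
```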

Our general strategy is to build an optimal colored cover from an optimal colored cover of the subtree $T_w$. To do that, we exploit certain properties, described in the following result, of optimal solutions having a minimum number of mixed paths.

Theorem 8. Every white-grey tree T has an optimal colored cover $\mathcal{P}$ such that

(1) $\mathcal{P}$ contains at most 2 mixed paths,

(2) $\mathcal{P}|_{T_w}$ is an optimal cover of $T_w$,

(3) for each mixed path $P \in \mathcal{P}$, $\mathrm{cost}(P) = \mathrm{cost}(P|_{T_w})$, and so $P|_{T_w}$ is either a long path, or a short path consisting of a single grey leaf.

Proof. (1) Let $\mathcal{P}$ be an optimal cover with a minimum number of mixed paths. Assume, on the contrary, that $\mathcal{P}$ contains three mixed paths $P_1$, $P_2$ and $P_3$, where $P_i$ is a path from the grey leaf $g_i$ to the colored vertex $c_i \in T_w$. (If two paths cover the same grey leaf, then deleting that leaf from one of the paths decreases the number of mixed paths in the cover, so we may assume that the grey leaves are pairwise distinct.)

Let the path P be the intersection of the paths $P_1$, $P_2$, $P_3$. Clearly P is a path from the root to some $c \in T_w$ (c may be the root itself). Since c is the "last" point of the intersection, we can assume (after renumbering) that the sub-paths $P_{c_1,c}$ and $P_{c_2,c}$ are edge-disjoint (and of course vertex-disjoint except at c). Then replace the paths $\{P_1, P_2, P_3\}$ in $\mathcal{P}$ with the paths $\{P_{g_1,g_2}, P_{c_1,c_2}, P_3\}$ to obtain a path cover $\mathcal{P}'$; the vertices between the root and c remain covered by $P_3$. But $\mathrm{cost}(\mathcal{P}') \le \mathrm{cost}(\mathcal{P})$ and $\mathcal{P}'$ contains fewer mixed paths than $\mathcal{P}$, a contradiction.

(2) So we have an optimal cover which contains at most two mixed paths. If its trace is not optimal, then consider the following cover $\mathcal{Q}$: cover $T_w$ optimally (this costs at least 1 less than the trace of the original cover), keep the paths of $\mathcal{P}$ which do not intersect $T_w$, and finally cover the (at most two) grey vertices that were covered by the mixed path(s) with a path whose cost is 1. Then the cost of $\mathcal{Q}$ is at most the cost of $\mathcal{P}$, and $\mathcal{Q}$ does not contain any mixed path.

(3) Assume that $2 = \mathrm{cost}(P) > \mathrm{cost}(P|_{T_w}) = 1$ for a mixed path $P \in \mathcal{P}$. Then the restriction $P|_{T_w}$ is a short white path covering a vertex u, and P is the path $P_{u,g} = (u, \mathrm{root}, g)$ for a grey leaf g. Replacing P with two short paths covering u and g, respectively, keeps the cost of the cover but decreases the number of mixed paths. □

A cover $\mathcal{P}$ is nice iff it satisfies the requirements of Theorem 8. Let P be a path in $T_w$. We say that P is free iff P can be extended to a path P' such that P' contains a grey leaf and cost(P) = cost(P'). Theorem 8 implies the following statement:

Lemma 9. Assume that T is a white-grey tree which has g grey leaves. 




where f is the maximal number of free paths in a nice optimal cover of $T_w$.



□ 
Next we solve the white-grey tree cover problem for the subtree $T_w$. To this end, we first solve the problem for trees in which (essentially) all leaves are white. In what follows, we say that a leaf is short if it is adjacent to a branching vertex.

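Lemma 10. Let T' be a white-grey tree with w colored leaves that has either no grey vertex or exactly one grey leaf. Then the minimal cost of a colored cover is

$$\mathrm{cost}(T') = \begin{cases} w + 1, & \text{if } w \text{ is odd and there is no short leaf,} \\ w, & \text{otherwise.} \end{cases}$$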
Proof. Since we have at most one grey leaf, we cannot use a "cheap" grey-grey path to cover it. So we can change the color of that vertex to white without changing the cost of the tree, and thus assume that all leaves are white.

If the number of leaves is even, then the result is a direct consequence of Lemma 3 and Theorem 4. If the number of leaves is odd but there is a short leaf, then we cover that leaf with a short path; deleting it from the tree, we are back to the previous case.

Finally, assume that w is odd but cost(T') = w. Then each leaf is covered exactly once in an optimal cover, and one of them is covered by a short path. If this leaf is not a short one, however, then its colored neighbor is not covered, a contradiction. For simplicity, we fix the following convention: in this case (w odd and no short leaf) the constructed optimal cover contains a long path which does not cover any branching vertex. This path will be called a half-path. □
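Lemma 10 reduces to a parity test and a local check on the leaves. A Python sketch, assuming (as the proof allows after recoloring) that every leaf is white and reading "branching vertex" as a vertex of degree at least 3; the names are hypothetical. The spider below, a branching vertex with three legs of length two, is the tree $T_w$ of the example discussed next: w = 3 is odd and no leaf is short, so one extra unit is paid for the half-path, whereas every leaf of the star is short.

```python
def build_adj(edges):
    """Undirected adjacency lists from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def lemma10_cost(adj):
    """cost(T') for a tree all of whose leaves are white (Lemma 10)."""
    leaves = [x for x in adj if len(adj[x]) == 1]
    w = len(leaves)
    # a leaf is short if its only neighbour is a branching vertex (degree >= 3)
    has_short_leaf = any(len(adj[adj[x][0]]) >= 3 for x in leaves)
    return w + 1 if w % 2 == 1 and not has_short_leaf else w


spider = build_adj([("b", f"c{i}") for i in (1, 2, 3)]
                   + [(f"c{i}", f"l{i}") for i in (1, 2, 3)])
star = build_adj([("z", "l1"), ("z", "l2"), ("z", "l3")])
print(lemma10_cost(spider), lemma10_cost(star))  # 4 3
```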

Let us remark that Lemma 10 for white-only trees is certainly not new: it was proved as early as 1995 (see [1]). But the consideration of more general white-grey trees raises several problematic issues. One of them is that, in the literature known to the authors, grey vertices which are not leaves have not been studied. However, white-grey trees are constructed in connection with the genome rearrangement problem [2], and grey vertices can appear in non-leaf positions.

Another problem is that paper [2] fails to determine the exact cost of a minimal colored cover in some cases. Here we give only one of them. (The references relate to the relevant sections of that paper.) Assume that the root of T has two neighbors: one is a grey leaf (g = 1), and the other one is a branching vertex. Furthermore, assume that w is odd and no white leaf is short. Then we are in the scope of Theorem 5 of [2]. Since g is odd and $T_c$ is a fortress or junior fortress, we have to apply the "otherwise" case of Theorem 5. That formula now gives $\mathrm{cost}(T) = w + \lceil g/2 \rceil + 1 = w + 2$, while the proper cost is only $w + \lceil g/2 \rceil = w + 1$.

Before we give our main result, we introduce one more notion: when exactly one of the children of the root is not a grey leaf, the (colored) vertices between the root and the first branching point are called dangerous.

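Theorem 11. Let T be a white-grey tree with g grey and w white leaves. Let $T_w$ be derived from T by deleting the grey leaves (and the root if it would become a leaf).

(1) If T does not have any dangerous vertex, then

$$\mathrm{cost}(T_w) = \begin{cases} w + 1, & \text{if } w \text{ is odd and there is no short leaf,} \\ w, & \text{otherwise,} \end{cases}$$

and

$$\mathrm{cost}(T) = \begin{cases} w + 1 + \lceil (g-1)/2 \rceil, & \text{if } w \text{ is odd and there is no short leaf in } T_w; \\ w + \lceil g/2 \rceil, & \text{otherwise.} \end{cases} \qquad (2)$$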
(2) If T has some dangerous vertices (and $T_w$ has w + 1 leaves), then

$$\mathrm{cost}(T) = \begin{cases} (w+1) + 1 + \max\{0, \lceil (g-2)/2 \rceil\}, & \text{if } (w+1) \text{ is odd and there is no short leaf in } T_w; \\ (w+1) + \lceil g/2 \rceil, & \text{if } (w+1) \text{ is odd, there is only one short leaf in } T_w, \text{ and that leaf is white and dangerous;} \\ (w+1) + \lceil (g-1)/2 \rceil, & \text{otherwise.} \end{cases}$$



Proof. (1) Assume that w is odd and there is no short leaf in $T_w$. Then, due to Lemma 10, $\mathrm{cost}(T_w) = w + 1$ and the half-path of the solution is clearly free, so f = 1. Otherwise $\mathrm{cost}(T_w) = w$, but we have no free path at all, so f = 0. In both cases, apply Lemma 9.

(2) Case 1: Assume that (w + 1) is odd and there is no short leaf in $T_w$. Then $\mathrm{cost}(T_w) = (w + 1) + 1$. In the derived optimal cover $\mathcal{P}_w$ of $T_w$, a leaf-leaf type long path P covers the dangerous vertices. Then the path P and the half-path of $\mathcal{P}_w$ are free, so f = 2. Then apply Lemma 9.
Case 2: Assume now that (w + 1) is odd, there is only one short leaf $\ell$ in $T_w$, and that leaf is white and dangerous. Then $\mathrm{cost}(T_w) = w + 1$. Moreover, an optimal cover $\mathcal{P}_w$ of $T_w$ must contain the short (non-free) path covering $\ell$, and there is no other free path, thus f = 0. Lemma 9 finishes the case.

Case 3: The "otherwise" cases. Assume first that (w + 1) is odd, there is only one short leaf $\ell$, and that vertex is grey (and therefore also dangerous). Then $\mathrm{cost}(T_w) = w + 1$. Moreover, an optimal cover of $T_w$ must contain the short path covering $\ell$. But $\ell$ is grey, so the path covering $\ell$ is free. Thus f = 1. Then apply Lemma 9.

Assume now that (w + 1) is odd, and there is a short leaf $\ell$ which is not dangerous. Then there is an optimal cover of $T_w$ in which the dangerous vertices are covered by a long path P. Then P is free, so $f \ge 1$. Since $\mathrm{cost}(T_w) = w + 1$, in an optimal cover of $T_w$ only one long path can be free, because all long paths in such a cover contain two leaves. So $f \le 1$, i.e. f = 1. Now apply Lemma 9.

Assume finally that (w + 1) is even. Then $\mathrm{cost}(T_w) = w + 1$, and in an optimal cover of $T_w$ only one long path can be free, because all long paths in an optimal cover must contain two leaves. So $f \le 1$. However, there is an optimal cover of $T_w$ containing a long path which covers the dangerous vertices, so $f \ge 1$. Thus f = 1. Now apply Lemma 9. □
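Part (1) of Theorem 11 is easy to evaluate mechanically. The Python sketch below implements that formula only (part (2) is not covered). It builds $T_w$ itself, takes "branching" to mean degree at least 3, and uses our own reading of "dangerous vertex" (the single child of the root that is not a grey leaf is not itself a branching point); all names are hypothetical. On the example with one grey leaf and a three-legged branching vertex it returns $w + \lceil g/2 \rceil = 4$, the value claimed in the text, rather than w + 2 = 5.

```python
def build_adj(edges):
    """Undirected adjacency lists from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def ceil_half(k):
    """Ceiling of k / 2 for any integer k (also negative)."""
    return -(-k // 2)


def is_grey_leaf(adj, color, x):
    return color.get(x) == "grey" and len(adj[x]) == 1


def has_dangerous_vertex(adj, color, root):
    """Our reading: exactly one child of the root is not a grey leaf, and it is not branching."""
    others = [c for c in adj[root] if not is_grey_leaf(adj, color, c)]
    return len(others) == 1 and len(adj[others[0]]) < 3


def theorem11_part1(adj, color, root):
    """cost(T) by Theorem 11 (1); trees with dangerous vertices are rejected."""
    if has_dangerous_vertex(adj, color, root):
        raise ValueError("T has dangerous vertices: part (1) does not apply")
    g = sum(1 for x in adj if x != root and is_grey_leaf(adj, color, x))
    keep = {x for x in adj if not is_grey_leaf(adj, color, x)}
    if sum(1 for y in adj[root] if y in keep) == 1:
        keep.discard(root)  # the root would become a leaf of T_w
    tw = {x: [y for y in adj[x] if y in keep] for x in keep}
    leaves = [x for x in tw if len(tw[x]) == 1]
    w = len(leaves)
    has_short_leaf = any(len(tw[tw[x][0]]) >= 3 for x in leaves)
    if w % 2 == 1 and not has_short_leaf:
        return w + 1 + ceil_half(g - 1)
    return w + ceil_half(g)


tree1 = build_adj([("r", "g1"), ("r", "g2"), ("r", "a"), ("a", "x"), ("a", "y")])
color1 = {"g1": "grey", "g2": "grey", "a": "white", "x": "white", "y": "white"}
tree2 = build_adj([("r", "g"), ("r", "b")]
                  + [("b", f"c{i}") for i in (1, 2, 3)]
                  + [(f"c{i}", f"l{i}") for i in (1, 2, 3)])
color2 = {"g": "grey", **{v: "white" for v in ("c1", "c2", "c3", "l1", "l2", "l3")}}
print(theorem11_part1(tree1, color1, "r"), theorem11_part1(tree2, color2, "r"))  # 3 4
```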